
perf: reduce ConcurrentDictionary closure allocations in hot paths #5210

Merged
thomhurst merged 1 commit into main from perf/reduce-concurrent-dict-allocations on Mar 22, 2026


@thomhurst (Owner) commented Mar 22, 2026

Summary

  • TestExecutionGuard: Add TryGetValue fast path before GetOrAdd to avoid allocating a TaskCompletionSource<bool> on every call when a test is already executing
  • EventReceiverOrchestrator: Add TryGetValue fast paths before GetOrAdd calls for first-test-in-session/assembly/class tasks, avoiding closure allocations from captured context, cancellationToken, and other state on the hot path. Also added fast paths for assembly/class test count counters to reduce lock contention (these use static lambdas so the savings are contention-related, not closure-related)
  • HookDelegateBuilder: Mark GetCachedGenericTypeDefinition lambda as static to prevent implicit closure allocation
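A minimal sketch of the TryGetValue-before-GetOrAdd pattern described in the summary. `FirstTestTracker`, `RunFirstTestOnceAsync`, `AddSlow` and `InvokeCoreAsync` are hypothetical stand-ins, not TUnit's actual code. `ConcurrentDictionary.GetOrAdd` already looks the key up before invoking the factory, so on a hit the explicit `TryGetValue` mainly saves creating the capturing delegate just to pass it in. The capturing call is kept in a separate method on purpose: C# allocates the closure object at entry to the scope that declares the captured variables (here, the parameters), so a capture in the fast-path method itself would still allocate on every call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a "first test in session/assembly/class" hook runner.
public sealed class FirstTestTracker
{
    private readonly ConcurrentDictionary<string, Task> _firstTestTasks = new();

    // Number of times the expensive hook actually ran.
    public int CoreInvocations;

    // Hot path, called for every test. It contains no lambda, so no closure is allocated here.
    public ValueTask RunFirstTestOnceAsync(string scopeKey, string testName, CancellationToken cancellationToken)
    {
        if (_firstTestTasks.TryGetValue(scopeKey, out var existingTask))
        {
            return new ValueTask(existingTask);
        }

        return new ValueTask(AddSlow(scopeKey, testName, cancellationToken));
    }

    // Slow path, reached only while the key is missing. The lambda captures
    // 'this', testName and cancellationToken, so the closure is allocated only here.
    private Task AddSlow(string scopeKey, string testName, CancellationToken cancellationToken)
    {
        return _firstTestTasks.GetOrAdd(scopeKey, _ => InvokeCoreAsync(testName, cancellationToken));
    }

    private Task InvokeCoreAsync(string testName, CancellationToken cancellationToken)
    {
        Interlocked.Increment(ref CoreInvocations);
        return Task.Delay(10, cancellationToken);
    }
}
```

Every test after the first in a scope returns the cached `Task` from the fast path; only the first caller per key pays for the closure.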

Rationale

Profiling shows ~4.1% exclusive CPU in ConcurrentDictionary operations (TryAddInternal, AcquireAllLocks). These changes reduce allocations and contention by:

  1. Checking TryGetValue before GetOrAdd so closures capturing state are never created on the fast path (when key already exists)
  2. Avoiding unnecessary TaskCompletionSource allocations in the execution guard
  3. Reducing lock contention on counter dictionaries that use static lambdas (already cached by the compiler, but GetOrAdd still acquires internal locks)
  4. Using static lambdas where the lambda body doesn't need captured state
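The difference between points 1 and 4 can be observed through delegate identity. With current Roslyn compilers, a lambda that captures nothing is stored in a compiler-generated static field and reused, while a capturing lambda produces a new closure and delegate every time the expression is evaluated. This is compiler behavior rather than a language guarantee, so the sketch below (with a hypothetical `DelegateCaching` class) is illustrative only.

```csharp
using System;

public static class DelegateCaching
{
    // Captures nothing: Roslyn caches this delegate in a static field and reuses it.
    public static Func<int, int> NonCapturing() => static x => x * 2;

    // Captures 'factor': every call allocates a closure object and a new delegate.
    public static Func<int, int> Capturing(int factor) => x => x * factor;
}
```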

No behavioral changes — only allocation and contention reduction. Thread safety guarantees are preserved since TryGetValue is lock-free on ConcurrentDictionary and the subsequent GetOrAdd handles the race correctly.
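The race argument can be checked directly: many threads run the check-then-GetOrAdd sequence for the same key, and all of them must observe the same stored instance. Note that `ConcurrentDictionary` may invoke the value factory more than once when threads race on a missing key; only one result is stored, and every caller receives that one. `FastPathRace` and `GetShared` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class FastPathRace
{
    private static readonly ConcurrentDictionary<string, object> Cache = new();

    // Counts factory invocations; may exceed 1 under a race, which is allowed.
    public static int FactoryCalls;

    public static object GetShared(string key)
    {
        // Lock-free read: returns immediately once any thread has stored a value.
        if (Cache.TryGetValue(key, out var existing))
        {
            return existing;
        }

        // On a miss, GetOrAdd picks the winner. The factory may run on several
        // racing threads, but only one result is stored and returned to all of them.
        return Cache.GetOrAdd(key, static _ =>
        {
            Interlocked.Increment(ref FactoryCalls);
            return new object();
        });
    }
}
```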

Test plan

  • Verify all existing tests pass (no behavioral changes)
  • Confirm build succeeds across all target frameworks (net8.0, net9.0, net10.0, netstandard2.0)

Add TryGetValue fast paths before GetOrAdd calls to avoid closure
allocations and unnecessary ConcurrentDictionary contention on the
hot path. Key changes:

- TestExecutionGuard: avoid allocating TaskCompletionSource when a
  test is already executing by checking TryGetValue first
- EventReceiverOrchestrator: add TryGetValue fast paths before
  GetOrAdd for first-test-in-session/assembly/class event tasks
  and for assembly/class test count counters, avoiding closure
  allocations from captured state
- HookDelegateBuilder: mark GetCachedGenericTypeDefinition lambda
  as static to prevent implicit closure allocation
@claude (bot, Contributor) left a comment


Code Review

Overall, this is a well-reasoned performance optimization PR. The changes are generally correct and the approach is sound. Here are my findings:

HookDelegateBuilder.cs — static lambda

Clean, correct change. Adding static documents the intent and makes the compiler reject any closure accidentally introduced by a future edit to the lambda. No concerns.
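For illustration, a cache in the spirit of GetCachedGenericTypeDefinition might look like the sketch below. The class and field names are hypothetical; only the `static` modifier on the factory lambda reflects the change under review.

```csharp
using System;
using System.Collections.Concurrent;

public static class GenericTypeDefinitionCache
{
    private static readonly ConcurrentDictionary<Type, Type> Cache = new();

    // 'static' forbids capturing locals or 'this', so the compiler can reuse
    // one cached delegate instead of allocating a closure per call.
    public static Type GetCachedGenericTypeDefinition(Type type) =>
        Cache.GetOrAdd(type, static t => t.GetGenericTypeDefinition());
}
```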


TestExecutionGuard.cs — TryGetValue fast path

The fast path is correct. Worth noting the subtle thread-safety contract here: ExecuteAndCompleteAsync calls SetResult/SetException before TryRemove (in finally), so any TCS retrieved via TryGetValue is guaranteed to be either in progress (await suspends until it completes) or already completed (await returns immediately). Either way, WaitForExistingExecutionAsync behaves correctly.

The optimization is valid — it avoids allocating a TaskCompletionSource<bool> on the common path when the test is already tracked.
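The contract above can be sketched as follows. `ExecutionGuard` and `ExecuteOnceAsync` are hypothetical simplifications, not the actual TestExecutionGuard source. Callers that find an entry await the existing TaskCompletionSource; only the caller whose TCS wins GetOrAdd runs the work; and, following the ordering described in the review, the TCS is completed before the entry is removed in `finally`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class ExecutionGuard
{
    private readonly ConcurrentDictionary<string, TaskCompletionSource<bool>> _executing = new();

    public Task ExecuteOnceAsync(string testId, Func<Task> action)
    {
        // Fast path: the test is already running, so no TCS is allocated.
        if (_executing.TryGetValue(testId, out var existing))
        {
            return existing.Task;
        }

        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        var winner = _executing.GetOrAdd(testId, tcs);

        // Another caller won the race: wait for its execution instead of starting one.
        if (!ReferenceEquals(winner, tcs))
        {
            return winner.Task;
        }

        return ExecuteAndCompleteAsync(testId, tcs, action);
    }

    private async Task ExecuteAndCompleteAsync(string testId, TaskCompletionSource<bool> tcs, Func<Task> action)
    {
        try
        {
            await action();
            tcs.SetResult(true);
        }
        catch (Exception ex)
        {
            tcs.SetException(ex);
        }
        finally
        {
            // Completed first, then removed: a new caller can only start a
            // fresh run once the previous one has finished.
            _executing.TryRemove(testId, out _);
        }

        // Surface the result or exception to the executing caller as well.
        await tcs.Task;
    }
}
```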


EventReceiverOrchestrator.cs — First-test event task fast paths

This is the highest-impact change. The GetOrAdd closures in InvokeFirstTestIn{Session,Assembly,Class}EventReceiversAsync do capture context, assemblyContext/classContext/sessionContext, and cancellationToken — so every call without the fast path allocates a new closure object on the heap. The TryGetValue bypass avoids this on the hot path (every test after the first in a session/assembly/class). Correct and well-targeted.
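A rough way to observe the allocation difference, assuming a modern .NET runtime (e.g. net8.0) where `GC.GetAllocatedBytesForCurrentThread` is available. `WithoutFastPath` evaluates a capturing lambda on every call; `WithFastPath` returns early on a hit and only reaches the capturing lambda through a separate method. All names are hypothetical, and exact byte counts depend on the runtime, so the sketch only compares the two paths.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class ClosureAllocationDemo
{
    private static readonly ConcurrentDictionary<string, int> Values = new();

    // Always creates a closure and delegate, even when the key already exists.
    public static int WithoutFastPath(string key, int seed, CancellationToken token) =>
        Values.GetOrAdd(key, _ => Compute(seed, token));

    // No lambda in this method, so a hit allocates nothing.
    public static int WithFastPath(string key, int seed, CancellationToken token) =>
        Values.TryGetValue(key, out var value) ? value : AddSlow(key, seed, token);

    // The capture lives here, so its closure is only allocated on a miss.
    private static int AddSlow(string key, int seed, CancellationToken token) =>
        Values.GetOrAdd(key, _ => Compute(seed, token));

    private static int Compute(int seed, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();
        return seed * 2;
    }

    // Bytes allocated on the current thread across 'iterations' hits on an existing key.
    public static long MeasureBytes(Func<string, int, CancellationToken, int> call, int iterations)
    {
        call("warm", 1, CancellationToken.None); // populate the key and JIT the path
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            call("warm", 1, CancellationToken.None);
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```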


Minor: PR description slightly overstates savings for counter dictionaries

In InvokeLastTestIn{Assembly,Class}EventReceiversAsync and InitializeTestCounts, the pre-existing lambdas are static _ => new Counter(). In C#, static lambdas with no captures compile to cached static delegate fields — the delegate itself is never heap-allocated on repeated calls. So the TryGetValue bypass here does not save a closure/delegate allocation; it only saves the GetOrAdd internal hash-table lookup and potential lock acquisition under contention.

This is still a valid win, but the PR description frames these as "closure allocation" savings when they're really "lock contention" savings. Not a correctness issue — just worth being precise for future readers.


InitializeTestCounts — cold path concern (pre-existing, not introduced here)

The counter dicts (_assemblyTestCounts, _classTestCounts) are not cleared during InitializeTestCounts, unlike the first-test task dicts. If InitializeTestCounts is ever called more than once (e.g., re-runs), counter.Add(group.Count()) accumulates on top of existing values. This pre-dates this PR, but the refactored pattern makes it more visible. Worth confirming the lifecycle contract — is InitializeTestCounts always called exactly once, or should these counters also be reset?
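To make the concern concrete, here is a hypothetical reduction: `Counter`, `TestCounts` and this `InitializeTestCounts` are simplified stand-ins, not the real types. Calling initialization twice without clearing doubles the count; clearing first is one possible fix if re-initialization is a supported scenario, assuming initialization runs before any test reads the counters.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical thread-safe counter standing in for the real Counter type.
public sealed class Counter
{
    private int _value;
    public int Value => Volatile.Read(ref _value);
    public void Add(int amount) => Interlocked.Add(ref _value, amount);
}

public sealed class TestCounts
{
    private readonly ConcurrentDictionary<string, Counter> _classTestCounts = new();

    public void InitializeTestCounts(IEnumerable<string> testClassNames, bool resetFirst)
    {
        if (resetFirst)
        {
            // Without this, a second call adds on top of the previous totals.
            _classTestCounts.Clear();
        }

        foreach (var group in testClassNames.GroupBy(name => name))
        {
            var counter = _classTestCounts.GetOrAdd(group.Key, static _ => new Counter());
            counter.Add(group.Count());
        }
    }

    public int CountFor(string className) =>
        _classTestCounts.TryGetValue(className, out var counter) ? counter.Value : 0;
}
```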


Summary

The core optimization is sound and the implementation is correct. The TryGetValue-before-GetOrAdd pattern is a well-established idiom for ConcurrentDictionary hot paths. The most impactful change is the first-test event closure avoidance; the TestExecutionGuard TCS avoidance and the static lambda are clean wins too. The only non-trivial note is a pre-existing potential double-count concern in InitializeTestCounts worth confirming.

@thomhurst enabled auto-merge (squash) March 22, 2026 12:29
@thomhurst merged commit 4961754 into main Mar 22, 2026 (15 of 16 checks passed)
@thomhurst deleted the perf/reduce-concurrent-dict-allocations branch March 22, 2026 12:34
```csharp
    return new ValueTask(existingTask);
}

var task = _firstTestInSessionTasks.GetOrAdd("session",
```
@hartmair (Contributor) commented:

@thomhurst FYI, there is also a GetOrAdd overload that takes a factoryArgument for exactly this case; internally it does almost the same thing (see ConcurrentDictionary.cs:1223):

```csharp
// State is passed as factoryArgument, so the static lambda captures nothing.
var task = _firstTestInSessionTasks.GetOrAdd("session",
    static (_, args) => InvokeFirstTestInSessionEventReceiversCoreAsync(args.Item1, args.Item2, args.Item3),
    (context, sessionContext, cancellationToken));
return new ValueTask(task);
```
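A self-contained version of the suggested overload, with hypothetical names (`SessionHooks`, `InvokeFirstTestInSessionAsync`, `InvokeCoreAsync`). `GetOrAdd<TArg>(key, valueFactory, factoryArgument)` hands the state to the factory explicitly, so the factory can be `static` and neither a closure nor a new delegate is created per call. Because a static lambda cannot reference `this`, the instance travels inside the tuple. One caveat worth checking: to our knowledge this overload exists on .NET Core 2.0+ and .NET Standard 2.1 but not on netstandard2.0, which matters for a library that also targets netstandard2.0.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class SessionHooks
{
    private readonly ConcurrentDictionary<string, Task> _firstTestInSessionTasks = new();

    // Number of times the expensive hook actually ran.
    public int CoreInvocations;

    public ValueTask InvokeFirstTestInSessionAsync(string context, string sessionContext, CancellationToken cancellationToken)
    {
        // All state, including 'this', is passed via the tuple argument.
        var task = _firstTestInSessionTasks.GetOrAdd(
            "session",
            static (_, args) => args.self.InvokeCoreAsync(args.context, args.sessionContext, args.cancellationToken),
            (self: this, context, sessionContext, cancellationToken));
        return new ValueTask(task);
    }

    private Task InvokeCoreAsync(string context, string sessionContext, CancellationToken cancellationToken)
    {
        Interlocked.Increment(ref CoreInvocations);
        return Task.CompletedTask;
    }
}
```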

@thomhurst (Owner, Author) replied:

Thanks @hartmair!

This was referenced Mar 22, 2026
This was referenced Mar 25, 2026

2 participants